{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Working with Multiple DataFrames\n",
"
\n",
"\n",
"Often you'll work with multiple dataframes that you want to stick together or merge. `df.merge()` and `df.concat()` are all you need to know for combining dataframes. The Pandas [documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html) is very helpful for these functions, but they are pretty easy to grasp.\n",
"\n",
"```{note}\n",
"The example joins shown in this section are inspired by [Chapter 15](https://stat545.com/join-cheatsheet.html) of Jenny Bryan's STAT 545 materials.\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sticking DataFrames Together with `pd.concat()`\n",
"
\n",
"You can use `pd.concat()` to stick dataframes together:\n",
"- Vertically: if they have the same **columns**, OR\n",
"- Horizontally: if they have the same **rows**"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"df1 = pd.DataFrame({'A': [1, 3, 5],\n",
" 'B': [2, 4, 6]})\n",
"df2 = pd.DataFrame({'A': [7, 9, 11],\n",
" 'B': [8, 10, 12]})"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" A | \n",
" B | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
"
\n",
" \n",
" 1 | \n",
" 3 | \n",
" 4 | \n",
"
\n",
" \n",
" 2 | \n",
" 5 | \n",
" 6 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B\n",
"0 1 2\n",
"1 3 4\n",
"2 5 6"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" A | \n",
" B | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 7 | \n",
" 8 | \n",
"
\n",
" \n",
" 1 | \n",
" 9 | \n",
" 10 | \n",
"
\n",
" \n",
" 2 | \n",
" 11 | \n",
" 12 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B\n",
"0 7 8\n",
"1 9 10\n",
"2 11 12"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" A | \n",
" B | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
"
\n",
" \n",
" 1 | \n",
" 3 | \n",
" 4 | \n",
"
\n",
" \n",
" 2 | \n",
" 5 | \n",
" 6 | \n",
"
\n",
" \n",
" 0 | \n",
" 7 | \n",
" 8 | \n",
"
\n",
" \n",
" 1 | \n",
" 9 | \n",
" 10 | \n",
"
\n",
" \n",
" 2 | \n",
" 11 | \n",
" 12 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B\n",
"0 1 2\n",
"1 3 4\n",
"2 5 6\n",
"0 7 8\n",
"1 9 10\n",
"2 11 12"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.concat((df1, df2), axis=0) # axis=0 specifies a vertical stick, i.e., on the columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that the indexes were simply joined together? This may or may not be what you want. To reset the index, you can specify the argument `ignore_index=True`:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" A | \n",
" B | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
"
\n",
" \n",
" 1 | \n",
" 3 | \n",
" 4 | \n",
"
\n",
" \n",
" 2 | \n",
" 5 | \n",
" 6 | \n",
"
\n",
" \n",
" 3 | \n",
" 7 | \n",
" 8 | \n",
"
\n",
" \n",
" 4 | \n",
" 9 | \n",
" 10 | \n",
"
\n",
" \n",
" 5 | \n",
" 11 | \n",
" 12 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B\n",
"0 1 2\n",
"1 3 4\n",
"2 5 6\n",
"3 7 8\n",
"4 9 10\n",
"5 11 12"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.concat((df1, df2), axis=0, ignore_index=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use `axis=1` to stick together horizontally:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
" 3 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
" 7 | \n",
" 8 | \n",
"
\n",
" \n",
" 1 | \n",
" 3 | \n",
" 4 | \n",
" 9 | \n",
" 10 | \n",
"
\n",
" \n",
" 2 | \n",
" 5 | \n",
" 6 | \n",
" 11 | \n",
" 12 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2 3\n",
"0 1 2 7 8\n",
"1 3 4 9 10\n",
"2 5 6 11 12"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.concat((df1, df2), axis=1, ignore_index=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You are not limited to just two dataframes, you can concatenate as many as you want:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" A | \n",
" B | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
"
\n",
" \n",
" 1 | \n",
" 3 | \n",
" 4 | \n",
"
\n",
" \n",
" 2 | \n",
" 5 | \n",
" 6 | \n",
"
\n",
" \n",
" 3 | \n",
" 7 | \n",
" 8 | \n",
"
\n",
" \n",
" 4 | \n",
" 9 | \n",
" 10 | \n",
"
\n",
" \n",
" 5 | \n",
" 11 | \n",
" 12 | \n",
"
\n",
" \n",
" 6 | \n",
" 1 | \n",
" 2 | \n",
"
\n",
" \n",
" 7 | \n",
" 3 | \n",
" 4 | \n",
"
\n",
" \n",
" 8 | \n",
" 5 | \n",
" 6 | \n",
"
\n",
" \n",
" 9 | \n",
" 7 | \n",
" 8 | \n",
"
\n",
" \n",
" 10 | \n",
" 9 | \n",
" 10 | \n",
"
\n",
" \n",
" 11 | \n",
" 11 | \n",
" 12 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B\n",
"0 1 2\n",
"1 3 4\n",
"2 5 6\n",
"3 7 8\n",
"4 9 10\n",
"5 11 12\n",
"6 1 2\n",
"7 3 4\n",
"8 5 6\n",
"9 7 8\n",
"10 9 10\n",
"11 11 12"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.concat((df1, df2, df1, df2), axis=0, ignore_index=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Joining DataFrames with `pd.merge()`\n",
"
\n",
"\n",
"`pd.merge()` gives you the ability to \"join\" dataframes using different rules (just like with SQL if you're familiar with it). You can use `df.merge()` to join dataframes based on shared `key` columns. Methods include:\n",
"- \"inner join\"\n",
"- \"outer join\"\n",
"- \"left join\"\n",
"- \"right join\"\n",
"\n",
"See this great [cheat sheet](https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html#compare-with-sql-join) and [these great animations](https://github.com/gadenbuie/tidyexplain) for more insights."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"df1 = pd.DataFrame({\"name\": ['Magneto', 'Storm', 'Mystique', 'Batman', 'Joker', 'Catwoman', 'Hellboy'],\n",
" 'alignment': ['bad', 'good', 'bad', 'good', 'bad', 'bad', 'good'],\n",
" 'gender': ['male', 'female', 'female', 'male', 'male', 'female', 'male'],\n",
" 'publisher': ['Marvel', 'Marvel', 'Marvel', 'DC', 'DC', 'DC', 'Dark Horse Comics']})\n",
"df2 = pd.DataFrame({'publisher': ['DC', 'Marvel', 'Image'],\n",
" 'year_founded': [1934, 1939, 1992]})"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" alignment | \n",
" gender | \n",
" publisher | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Magneto | \n",
" bad | \n",
" male | \n",
" Marvel | \n",
"
\n",
" \n",
" 1 | \n",
" Storm | \n",
" good | \n",
" female | \n",
" Marvel | \n",
"
\n",
" \n",
" 2 | \n",
" Mystique | \n",
" bad | \n",
" female | \n",
" Marvel | \n",
"
\n",
" \n",
" 3 | \n",
" Batman | \n",
" good | \n",
" male | \n",
" DC | \n",
"
\n",
" \n",
" 4 | \n",
" Joker | \n",
" bad | \n",
" male | \n",
" DC | \n",
"
\n",
" \n",
" 5 | \n",
" Catwoman | \n",
" bad | \n",
" female | \n",
" DC | \n",
"
\n",
" \n",
" 6 | \n",
" Hellboy | \n",
" good | \n",
" male | \n",
" Dark Horse Comics | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name alignment gender publisher\n",
"0 Magneto bad male Marvel\n",
"1 Storm good female Marvel\n",
"2 Mystique bad female Marvel\n",
"3 Batman good male DC\n",
"4 Joker bad male DC\n",
"5 Catwoman bad female DC\n",
"6 Hellboy good male Dark Horse Comics"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" publisher | \n",
" year_founded | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" DC | \n",
" 1934 | \n",
"
\n",
" \n",
" 1 | \n",
" Marvel | \n",
" 1939 | \n",
"
\n",
" \n",
" 2 | \n",
" Image | \n",
" 1992 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" publisher year_founded\n",
"0 DC 1934\n",
"1 Marvel 1939\n",
"2 Image 1992"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An \"inner\" join will return all rows of `df1` where matching values for \"publisher\" are found in `df2`:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" alignment | \n",
" gender | \n",
" publisher | \n",
" year_founded | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Magneto | \n",
" bad | \n",
" male | \n",
" Marvel | \n",
" 1939 | \n",
"
\n",
" \n",
" 1 | \n",
" Storm | \n",
" good | \n",
" female | \n",
" Marvel | \n",
" 1939 | \n",
"
\n",
" \n",
" 2 | \n",
" Mystique | \n",
" bad | \n",
" female | \n",
" Marvel | \n",
" 1939 | \n",
"
\n",
" \n",
" 3 | \n",
" Batman | \n",
" good | \n",
" male | \n",
" DC | \n",
" 1934 | \n",
"
\n",
" \n",
" 4 | \n",
" Joker | \n",
" bad | \n",
" male | \n",
" DC | \n",
" 1934 | \n",
"
\n",
" \n",
" 5 | \n",
" Catwoman | \n",
" bad | \n",
" female | \n",
" DC | \n",
" 1934 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name alignment gender publisher year_founded\n",
"0 Magneto bad male Marvel 1939\n",
"1 Storm good female Marvel 1939\n",
"2 Mystique bad female Marvel 1939\n",
"3 Batman good male DC 1934\n",
"4 Joker bad male DC 1934\n",
"5 Catwoman bad female DC 1934"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(df1, df2, how=\"inner\", on=\"publisher\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An \"outer\" join will return all rows of `df1` and `df2`, placing NaNs where information is unavailable:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" alignment | \n",
" gender | \n",
" publisher | \n",
" year_founded | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Magneto | \n",
" bad | \n",
" male | \n",
" Marvel | \n",
" 1939.0 | \n",
"
\n",
" \n",
" 1 | \n",
" Storm | \n",
" good | \n",
" female | \n",
" Marvel | \n",
" 1939.0 | \n",
"
\n",
" \n",
" 2 | \n",
" Mystique | \n",
" bad | \n",
" female | \n",
" Marvel | \n",
" 1939.0 | \n",
"
\n",
" \n",
" 3 | \n",
" Batman | \n",
" good | \n",
" male | \n",
" DC | \n",
" 1934.0 | \n",
"
\n",
" \n",
" 4 | \n",
" Joker | \n",
" bad | \n",
" male | \n",
" DC | \n",
" 1934.0 | \n",
"
\n",
" \n",
" 5 | \n",
" Catwoman | \n",
" bad | \n",
" female | \n",
" DC | \n",
" 1934.0 | \n",
"
\n",
" \n",
" 6 | \n",
" Hellboy | \n",
" good | \n",
" male | \n",
" Dark Horse Comics | \n",
" NaN | \n",
"
\n",
" \n",
" 7 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" Image | \n",
" 1992.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name alignment gender publisher year_founded\n",
"0 Magneto bad male Marvel 1939.0\n",
"1 Storm good female Marvel 1939.0\n",
"2 Mystique bad female Marvel 1939.0\n",
"3 Batman good male DC 1934.0\n",
"4 Joker bad male DC 1934.0\n",
"5 Catwoman bad female DC 1934.0\n",
"6 Hellboy good male Dark Horse Comics NaN\n",
"7 NaN NaN NaN Image 1992.0"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(df1, df2, how=\"outer\", on=\"publisher\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Return all rows from `df1` and all columns of `df1` and `df2`, populated where matches occur:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" alignment | \n",
" gender | \n",
" publisher | \n",
" year_founded | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Magneto | \n",
" bad | \n",
" male | \n",
" Marvel | \n",
" 1939.0 | \n",
"
\n",
" \n",
" 1 | \n",
" Storm | \n",
" good | \n",
" female | \n",
" Marvel | \n",
" 1939.0 | \n",
"
\n",
" \n",
" 2 | \n",
" Mystique | \n",
" bad | \n",
" female | \n",
" Marvel | \n",
" 1939.0 | \n",
"
\n",
" \n",
" 3 | \n",
" Batman | \n",
" good | \n",
" male | \n",
" DC | \n",
" 1934.0 | \n",
"
\n",
" \n",
" 4 | \n",
" Joker | \n",
" bad | \n",
" male | \n",
" DC | \n",
" 1934.0 | \n",
"
\n",
" \n",
" 5 | \n",
" Catwoman | \n",
" bad | \n",
" female | \n",
" DC | \n",
" 1934.0 | \n",
"
\n",
" \n",
" 6 | \n",
" Hellboy | \n",
" good | \n",
" male | \n",
" Dark Horse Comics | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name alignment gender publisher year_founded\n",
"0 Magneto bad male Marvel 1939.0\n",
"1 Storm good female Marvel 1939.0\n",
"2 Mystique bad female Marvel 1939.0\n",
"3 Batman good male DC 1934.0\n",
"4 Joker bad male DC 1934.0\n",
"5 Catwoman bad female DC 1934.0\n",
"6 Hellboy good male Dark Horse Comics NaN"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(df1, df2, how=\"left\", on=\"publisher\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" alignment | \n",
" gender | \n",
" publisher | \n",
" year_founded | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Batman | \n",
" good | \n",
" male | \n",
" DC | \n",
" 1934 | \n",
"
\n",
" \n",
" 1 | \n",
" Joker | \n",
" bad | \n",
" male | \n",
" DC | \n",
" 1934 | \n",
"
\n",
" \n",
" 2 | \n",
" Catwoman | \n",
" bad | \n",
" female | \n",
" DC | \n",
" 1934 | \n",
"
\n",
" \n",
" 3 | \n",
" Magneto | \n",
" bad | \n",
" male | \n",
" Marvel | \n",
" 1939 | \n",
"
\n",
" \n",
" 4 | \n",
" Storm | \n",
" good | \n",
" female | \n",
" Marvel | \n",
" 1939 | \n",
"
\n",
" \n",
" 5 | \n",
" Mystique | \n",
" bad | \n",
" female | \n",
" Marvel | \n",
" 1939 | \n",
"
\n",
" \n",
" 6 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" Image | \n",
" 1992 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name alignment gender publisher year_founded\n",
"0 Batman good male DC 1934\n",
"1 Joker bad male DC 1934\n",
"2 Catwoman bad female DC 1934\n",
"3 Magneto bad male Marvel 1939\n",
"4 Storm good female Marvel 1939\n",
"5 Mystique bad female Marvel 1939\n",
"6 NaN NaN NaN Image 1992"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(df1, df2, how=\"right\", on=\"publisher\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are many ways to specify the `key` to join dataframes on, you can join on index values, different, column names, etc. Another helpful argument is the `indicator` argument which will add a column to the result telling you where matches were found in the dataframes:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" alignment | \n",
" gender | \n",
" publisher | \n",
" year_founded | \n",
" _merge | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Magneto | \n",
" bad | \n",
" male | \n",
" Marvel | \n",
" 1939.0 | \n",
" both | \n",
"
\n",
" \n",
" 1 | \n",
" Storm | \n",
" good | \n",
" female | \n",
" Marvel | \n",
" 1939.0 | \n",
" both | \n",
"
\n",
" \n",
" 2 | \n",
" Mystique | \n",
" bad | \n",
" female | \n",
" Marvel | \n",
" 1939.0 | \n",
" both | \n",
"
\n",
" \n",
" 3 | \n",
" Batman | \n",
" good | \n",
" male | \n",
" DC | \n",
" 1934.0 | \n",
" both | \n",
"
\n",
" \n",
" 4 | \n",
" Joker | \n",
" bad | \n",
" male | \n",
" DC | \n",
" 1934.0 | \n",
" both | \n",
"
\n",
" \n",
" 5 | \n",
" Catwoman | \n",
" bad | \n",
" female | \n",
" DC | \n",
" 1934.0 | \n",
" both | \n",
"
\n",
" \n",
" 6 | \n",
" Hellboy | \n",
" good | \n",
" male | \n",
" Dark Horse Comics | \n",
" NaN | \n",
" left_only | \n",
"
\n",
" \n",
" 7 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" Image | \n",
" 1992.0 | \n",
" right_only | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name alignment gender publisher year_founded _merge\n",
"0 Magneto bad male Marvel 1939.0 both\n",
"1 Storm good female Marvel 1939.0 both\n",
"2 Mystique bad female Marvel 1939.0 both\n",
"3 Batman good male DC 1934.0 both\n",
"4 Joker bad male DC 1934.0 both\n",
"5 Catwoman bad female DC 1934.0 both\n",
"6 Hellboy good male Dark Horse Comics NaN left_only\n",
"7 NaN NaN NaN Image 1992.0 right_only"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(df1, df2, how=\"outer\", on=\"publisher\", indicator=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By the way, you can use `pd.concat()` to do a simple \"inner\" or \"outer\" join on multiple datadrames at once. It's less flexible than merge, but can be useful sometimes."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}